Enhancing Parallelism by Removing Cyclic Data Dependencies
Authors
Abstract
The parallel execution of loop iterations is often inhibited by recurrence relations on scalar variables. Examples are the use of induction variables and recursive functions. Due to the cyclic dependence between the iterations, these loops have to be executed sequentially. A method is presented to convert a family of coupled linear recurrence relations into explicit functions of a loop index. When the cyclic dependency is the only factor preventing a parallel execution, the conversion effectively removes the dependency and allows the loop to be executed in parallel. The technique is based on constructing and solving a set of coupled linear difference equations at compile time. The method is general for an arbitrary number of coupled scalar variables and can be implemented by a straightforward algorithm. Results show that the parallelism of several sequential EISPACK do-loops is significantly enhanced by converting them into do-all loops.
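As a minimal sketch of the kind of transformation the abstract describes (the loop body, initial values, closed forms, and the OpenMP pragma are illustrative assumptions, not taken from the paper), the C fragment below shows coupled linear recurrences on the scalars j and s rewritten as explicit functions of the loop index, after which the loop can run as a do-all loop:

    #include <stdio.h>

    #define N 16

    /* Sequential version: the coupled recurrences on j and s create a
     * cyclic dependence between iterations, forcing sequential execution. */
    void recurrence_loop(long a[N], long j0, long s0) {
        long j = j0, s = s0;
        for (int i = 0; i < N; i++) {
            j = j + 2;   /* induction variable: j_i = j0 + 2*(i+1)                  */
            s = s + j;   /* coupled recurrence: s_i = s0 + (i+1)*j0 + (i+1)*(i+2)   */
            a[i] = s;
        }
    }

    /* Converted version: each scalar is an explicit function of the loop
     * index i, so the iterations are independent and the loop can execute
     * as a do-all loop (illustrated here with an OpenMP pragma). */
    void doall_loop(long a[N], long j0, long s0) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            long ii = i + 1;
            long s = s0 + ii * j0 + ii * (ii + 1);  /* closed form of the difference equations */
            a[i] = s;
        }
    }

    int main(void) {
        long a[N], b[N];
        recurrence_loop(a, 1, 0);
        doall_loop(b, 1, 0);
        for (int i = 0; i < N; i++)
            printf("%ld %ld%s\n", a[i], b[i], a[i] == b[i] ? "" : "  MISMATCH");
        return 0;
    }

The paper's method derives such closed forms automatically at compile time for an arbitrary number of coupled scalars; the example above only shows the effect of the conversion on a single hand-solved pair.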
Similar resources
Design and Implementation of an Audio Codec (AMR-WB) using Data Flow Programming Language CAL in the OpenDF Environment
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip ar...
SIRA: Schedule Independent Register Allocation for Software Pipelining
Register allocation in loops is generally carried out after or during the software pipelining process, because performing register allocation first, without assuming a schedule, lacks information about the interferences between the live ranges of values. The register allocator then introduces extra false dependencies which dramatically reduce the original ILP (Instruction Level Parallelism)....
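For context (a minimal assumed sketch, not the SIRA technique itself), the C fragment below illustrates the false dependencies this abstract refers to: reusing one temporary for two unrelated values serializes them, while renaming keeps the two computation chains independent at the cost of one extra register.

    /* Reusing one temporary creates a false (write-after-read) dependency. */
    void reuse_temporary(double *x, double *y, double a, double b, double c, double d) {
        double t;
        t = a * b;      /* define t                                            */
        *x = t + 1.0;   /* use t                                               */
        t = c * d;      /* redefine t: must wait for the use above to complete */
        *y = t + 1.0;
    }

    /* Renaming removes the false dependency: the two chains can be
     * scheduled in parallel, at the cost of one more register. */
    void rename_temporaries(double *x, double *y, double a, double b, double c, double d) {
        double t1 = a * b;
        double t2 = c * d;
        *x = t1 + 1.0;
        *y = t2 + 1.0;
    }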
PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability
New HPC system designs with steeply escalating processor and core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable memory access times, call for one or more dramatically new programming paradigms. These new approaches must react and adapt quickly to unexpected contentions and delays, and they must provide the execution environment with sufficient intelligence an...
Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic
Software bypassing is a technique that allows programmer-controlled direct transfer of results of computations to the operands of data-dependent operations, possibly removing the need to store some values in general purpose registers, while reducing the number of reads from the register file. Software bypassing also improves instruction level parallelism by reducing the number of false dependenc...
Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code. (Nested loop parallelism to optimize execution time and code size)
Real-time implementation algorithms always include nested loops which require significant execution times. Thus, several nested-loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity: iteration-level parallelism and instruction-level parallelism. In the case of the instructio...
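As an assumed illustration of the coarser of the two granularities mentioned above, iteration-level parallelism distributes independent iterations of a nested loop across threads (shown here with OpenMP; the function and its parameters are hypothetical):

    /* Independent iterations of a nested loop distributed across threads. */
    void add_matrices(int m, int n, double c[m][n], double a[m][n], double b[m][n]) {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                c[i][j] = a[i][j] + b[i][j];
    }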